 decision hyperplane


Geometry of naturalistic object representations in recurrent neural network models of working memory

Neural Information Processing Systems

Working memory is a central cognitive ability crucial for intelligent decision-making. Recent experimental and computational work studying working memory has primarily used categorical (i.e., one-hot) inputs, rather than ecologically-relevant, multidimensional naturalistic ones.





Geometry of naturalistic object representations in recurrent neural network models of working memory

arXiv.org Artificial Intelligence

Working memory is a central cognitive ability crucial for intelligent decision-making. Recent experimental and computational work studying working memory has primarily used categorical (i.e., one-hot) inputs, rather than ecologically relevant, multidimensional naturalistic ones. Moreover, studies have primarily investigated working memory during single or few cognitive tasks. As a result, an understanding of how naturalistic object information is maintained in working memory in neural networks is still lacking. To bridge this gap, we developed sensory-cognitive models, comprising a convolutional neural network (CNN) coupled with a recurrent neural network (RNN), and trained them on nine distinct N-back tasks using naturalistic stimuli. By examining the RNN's latent space, we found that: (1) Multi-task RNNs represent both task-relevant and irrelevant information simultaneously while performing tasks; (2) The latent subspaces used to maintain specific object properties in vanilla RNNs are largely shared across tasks, but highly task-specific in gated RNNs such as GRU and LSTM; (3) Surprisingly, RNNs embed objects in new representational spaces in which individual object features are less orthogonalized relative to the perceptual space; (4) The transformation of working memory encodings (i.e., embedding of visual inputs in the RNN latent space) into memory was shared across stimuli, yet the transformations governing the retention of a memory in the face of incoming distractor stimuli were distinct across time. Our findings indicate that goal-driven RNNs employ chronological memory subspaces to track information over short time spans, enabling testable predictions with neural data.
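As a rough illustration of the N-back setup mentioned in the abstract, the sketch below computes match targets for a stimulus sequence: at each step the target is whether the current stimulus equals the one presented N steps earlier. The string labels and the function name `n_back_targets` are hypothetical; the paper's tasks use naturalistic images and presumably compare specific object properties, which this sketch does not model.

```python
from typing import List, Optional, Sequence


def n_back_targets(stimuli: Sequence[str], n: int) -> List[Optional[bool]]:
    """Return, for each time step, whether the stimulus matches the one n steps earlier.

    The first n steps have no reference stimulus, so they are marked None.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    targets: List[Optional[bool]] = []
    for t, current in enumerate(stimuli):
        if t < n:
            targets.append(None)  # nothing to compare against yet
        else:
            targets.append(current == stimuli[t - n])
    return targets


# Hypothetical stimulus labels standing in for naturalistic objects.
sequence = ["car", "chair", "car", "car", "chair"]
print(n_back_targets(sequence, n=2))  # [None, None, True, False, False]
```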


HyperVQ: MLR-based Vector Quantization in Hyperbolic Space

arXiv.org Artificial Intelligence

The success of models operating on tokenized data has led to an increased demand for effective tokenization methods, particularly when applied to vision or auditory tasks, which inherently involve non-discrete data. One of the most popular tokenization methods is Vector Quantization (VQ), a key component of several recent state-of-the-art methods across various domains. Typically, a VQ Variational Autoencoder (VQVAE) is trained to transform data to and from its tokenized representation. However, since the VQVAE is trained with a reconstruction objective, there is no constraint for the embeddings to be well disentangled, a crucial aspect for using them in discriminative tasks. Recently, several works have demonstrated the benefits of utilizing hyperbolic spaces for representation learning. Hyperbolic spaces induce compact latent representations due to their exponential volume growth and inherent ability to model hierarchical and structured data. In this work, we explore the use of hyperbolic spaces for vector quantization (HyperVQ), formulating the VQ operation as a hyperbolic Multinomial Logistic Regression (MLR) problem, in contrast to the Euclidean K-Means clustering used in VQVAE. Through extensive experiments, we demonstrate that HyperVQ performs comparably in reconstruction and generative tasks while outperforming VQ in discriminative tasks and learning a highly disentangled latent space.
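For orientation, the Euclidean baseline that the abstract contrasts with can be sketched as a nearest-codebook lookup: each latent vector is replaced by the index of its closest codebook entry. This is only a minimal sketch of standard Euclidean VQ assignment, not of HyperVQ's hyperbolic MLR formulation; the codebook values and the names `quantize` and `squared_distance` are illustrative assumptions.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each vector to the index (token) of its nearest codebook entry."""
    tokens = []
    for v in vectors:
        distances = [squared_distance(v, c) for c in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


# Hypothetical 2-D codebook and latent vectors, chosen only for illustration.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]
tokens = quantize(latents, codebook)
print(tokens)  # [0, 1, 2]
```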


Confronting Discrimination in Classification: Smote Based on Marginalized Minorities in the Kernel Space for Imbalanced Data

arXiv.org Artificial Intelligence

The class imbalance problem is a classic classification problem that arises when the number of negative samples (i.e., the majority class) in a data set is much larger than the number of positive samples (i.e., the minority class)[4]. This type of problem is common in many fields. For example, in financial fraud detection, fraud occurs only occasionally but can cause huge economic losses. Accurately identifying positive samples is therefore the key to the class imbalance problem. The first difficulty stems from the rarity of positive samples, which has two aspects[2]: absolute rarity, which makes the data unrepresentative and noisy; and relative rarity, which causes the classes to overlap heavily in feature space, making it hard for the model to separate them accurately. The second difficulty is the potential discrimination against positive samples by current mainstream classifiers: many models treat the majority and minority classes equally when evaluating classification accuracy, so model evaluation is naturally biased towards the majority class. The third difficulty is the potential discrimination by the oversampling model against important samples within the positive class. SMOTE, a classic oversampling method for class imbalance[1], selects data at random when expanding the minority class, which may worsen feature space overlap because important minority samples are neglected. To address these problems, we propose a hierarchical SMOTE Based on Marginalized Minorities (MM-SMOTE).
First, we use a basic SVM classifier to roughly classify the data and take the minority-class support vectors as the important samples for sampling. We then assign weights to those support vectors based on their distance to the decision hyperplane and, using the k-nearest neighbors of the support vectors, apply adaptive oversampling to generate synthetic samples. Finally, the synthetic samples are used to augment the original kernel function of the basic SVM to form a new classifier.
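The weighting and oversampling steps described above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the minority support vectors and their distances to the decision hyperplane are taken as given (as if returned by an already-fitted SVM), the weight is assumed to be inversely proportional to that distance, and synthetic points are drawn by plain SMOTE-style interpolation. The SVM fitting and the kernel augmentation step are not shown, and the names `weighted_smote` and `k_nearest` are hypothetical.

```python
import random
from typing import List, Sequence

Point = List[float]


def k_nearest(point: Point, pool: Sequence[Point], k: int) -> List[Point]:
    """Return the k points in pool closest to point, excluding equal points."""
    others = [p for p in pool if p != point]
    if not others:
        raise ValueError("need at least one other minority point")
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(point, p)))
    return others[:k]


def weighted_smote(support_vectors: Sequence[Point],
                   distances: Sequence[float],
                   minority: Sequence[Point],
                   k: int,
                   n_synthetic: int,
                   seed: int = 0) -> List[Point]:
    """Generate synthetic minority points around weighted support vectors."""
    rng = random.Random(seed)
    # ASSUMPTION: support vectors closer to the hyperplane get larger weights.
    weights = [1.0 / (abs(d) + 1e-8) for d in distances]
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choices(support_vectors, weights=weights, k=1)[0]
        neighbor = rng.choice(k_nearest(base, minority, k))
        step = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + step * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical minority samples; the first two play the role of support vectors.
minority = [[1.0, 1.0], [1.5, 1.2], [2.0, 2.0], [1.2, 0.8]]
new_points = weighted_smote(minority[:2], [0.2, 0.9], minority, k=2, n_synthetic=5)
print(len(new_points))  # 5
```

Because each synthetic point is a convex combination of two minority samples, it always lies on the segment between them; how the weights should depend on distance is a design choice the abstract leaves open.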


Debiased Self-Training for Semi-Supervised Learning

arXiv.org Artificial Intelligence

Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets. Yet these datasets are time-consuming and labor-intensive to obtain for realistic tasks. To mitigate the requirement for labeled data, self-training is widely used in semi-supervised learning by iteratively assigning pseudo labels to unlabeled samples. Despite its popularity, self-training is widely believed to be unreliable and often leads to training instability. Our experimental studies further reveal that the bias in semi-supervised learning arises from both the problem itself and inappropriate training with potentially incorrect pseudo labels, which accumulates error in the iterative self-training process. To reduce this bias, we propose Debiased Self-Training (DST). First, the generation and utilization of pseudo labels are decoupled by two parameter-independent classifier heads to avoid direct error accumulation. Second, we estimate the worst case of self-training bias, where the pseudo labeling function is accurate on labeled samples, yet makes as many mistakes as possible on unlabeled samples. We then adversarially optimize the representations to improve the quality of pseudo labels by avoiding the worst case. Extensive experiments demonstrate that DST achieves an average improvement of 6.3% against state-of-the-art methods on standard semi-supervised learning benchmark datasets and 18.9% against FixMatch on 13 diverse tasks. Furthermore, DST can be seamlessly adapted to other self-training methods and help stabilize their training and balance performance across classes in both cases of training from scratch and finetuning from pre-trained models.
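The decoupling idea can be illustrated with a small sketch in which pseudo labels are produced from one head's logits while the loss on those labels is computed only on a second head's logits. This is a simplified illustration, not the DST implementation: the confidence threshold of 0.95, the function names, and the example logits are all assumptions, and the worst-case estimation and adversarial optimization steps are not shown.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(generator_logits: Sequence[float],
                 threshold: float = 0.95) -> Optional[int]:
    """Return the predicted class if its confidence reaches the threshold, else None."""
    probs = softmax(generator_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


def cross_entropy(logits: Sequence[float], label: int) -> float:
    return -math.log(softmax(logits)[label])


def decoupled_pseudo_loss(generator_batch: Sequence[Sequence[float]],
                          consumer_batch: Sequence[Sequence[float]],
                          threshold: float = 0.95) -> float:
    """Average loss of the consuming head on labels produced by the generating head."""
    losses = []
    for gen, con in zip(generator_batch, consumer_batch):
        label = pseudo_label(gen, threshold)
        if label is not None:  # low-confidence samples are skipped (assumption)
            losses.append(cross_entropy(con, label))
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical logits for two unlabeled samples from the two heads.
generator_batch = [[5.0, 0.0, 0.0], [0.2, 0.1, 0.0]]
consumer_batch = [[2.0, 0.5, 0.1], [0.0, 1.0, 0.0]]
print(decoupled_pseudo_loss(generator_batch, consumer_batch))
```

Only the first sample passes the threshold here, so the loss reduces to the consuming head's cross-entropy on class 0 for that sample.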


Large-Margin Classification in Hyperbolic Space

arXiv.org Machine Learning

Representing data in hyperbolic space can effectively capture latent hierarchical relationships. With the goal of enabling accurate classification of points in hyperbolic space while respecting their hyperbolic geometry, we introduce hyperbolic SVM, a hyperbolic formulation of support vector machine classifiers, and elucidate through new theoretical work its connection to the Euclidean counterpart. We demonstrate the performance improvement of hyperbolic SVM for multi-class prediction tasks on real-world complex networks as well as simulated datasets. Our work allows analytic pipelines that take the inherent hyperbolic geometry of the data into account in an end-to-end fashion without resorting to ill-fitting tools developed for Euclidean space.
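To make the geometric setting concrete, the sketch below classifies points by the sign of a Minkowski inner product with a weight vector. It assumes the hyperboloid model of hyperbolic space and the sign convention x0*y0 - x1*y1 - ...; the abstract does not specify the model or the exact decision rule, so both are assumptions. Training (the large-margin optimization) is not shown, and the weight vector `w` is chosen by hand for illustration.

```python
import math
from typing import List, Sequence


def minkowski_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Minkowski inner product with the assumed convention u0*v0 - sum_i ui*vi."""
    return u[0] * v[0] - sum(a * b for a, b in zip(u[1:], v[1:]))


def on_hyperboloid(x: Sequence[float], tol: float = 1e-9) -> bool:
    """Check that x lies on the upper sheet, i.e. <x, x> = 1 and x0 > 0."""
    return x[0] > 0 and abs(minkowski_product(x, x) - 1.0) < tol


def lift(t: float, angle: float) -> List[float]:
    """Point at hyperbolic distance t from the origin, in direction angle."""
    return [math.cosh(t), math.sinh(t) * math.cos(angle), math.sinh(t) * math.sin(angle)]


def classify(w: Sequence[float], x: Sequence[float]) -> int:
    """Assumed decision rule: the sign of the Minkowski product of w and x."""
    return 1 if minkowski_product(w, x) > 0 else -1


# Hand-picked weight vector with <w, w> < 0 (illustrative, not learned).
w = [0.0, -1.0, 0.0]
right = lift(1.0, 0.0)      # point displaced along the positive first axis
left = lift(1.0, math.pi)   # point displaced along the negative first axis
print(classify(w, right), classify(w, left))  # 1 -1
```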